Abstract: A new inner bound on the capacity region of a 
general index coding problem is established. Unlike most existing 
bounds that are based on graph theoretic or algebraic tools, 
the bound is built on a random coding scheme and optimal 
decoding, and has a simple polymatroidal single-letter expression. 
The utility of the inner bound is demonstrated by examples that 
include the capacity region for all index coding problems with 
up to five messages (there are 9846 nonisomorphic ones). 

I. Introduction 

Consider the simple communication problem in Figure 1, 
which is often referred to as the index coding problem. The 
sender wishes to communicate N messages $M_j \in [1 : 2^{nR_j}]$, 
$j \in [1 : N]$, to their respective receivers over a common 
noiseless link that carries n bits $X^n$. Each receiver $j \in [1 : N]$ 
has prior knowledge of $M_{A_j}$, i.e., a subset $A_j \subseteq [1 : N] \setminus \{j\}$ 
of the messages. Based on this side information $M_{A_j}$ and 
the received bits $X^n$, receiver j finds the estimate $\hat{M}_j$ of the 
message $M_j$. A nontrivial tradeoff arises between the rates $R_j$, 
$j \in [1 : N]$, of the messages since they compete for the shared 
broadcast medium for receivers with incompatible knowledge. 
A rate tuple is achievable if some sequence of codes attains 
$\lim_{n \to \infty} P_e^{(n)} = 0$, where $P_e^{(n)}$ denotes the average 
probability of error. The capacity region $\mathscr{C}$ of the index coding 
problem is the closure of the set of achievable rate tuples 
$(R_1, \ldots, R_N)$. The goal is to find the capacity region and the 
optimal coding scheme that achieves it. 

Note that an index coding problem is fully characterized 
by the side information sets $A_j$, $j \in [1 : N]$. As an example, 
consider the 3-message index coding problem with $A_1 = \{2\}$, 
$A_2 = \{1, 3\}$, and $A_3 = \{1\}$. We represent this problem 
compactly as 

(1|2), (2|1,3), (3|1), (1) 

or as a directed graph (see Figure 2(a)), where nodes represent 
indices of the messages/receivers and edges represent 
availability of side information (e.g., the edge $1 \to 2$ means 
that side information $M_1$ is available at receiver 2). Note that 
this 3-message index coding problem can be represented as 
an instance of the network coding problem [1] as illustrated 
in Figure 2(b). The same observation can be made for any 
index coding problem; thus, index coding is a special case of 
network coding. 
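
As a small illustration of the graph representation, the following sketch builds the edge set from the side information sets. The function name `side_info_graph` and the dictionary encoding of the problem are assumptions made here for illustration; they do not appear in the paper.

```python
def side_info_graph(side_info):
    """Return the edge set {(j, k)}: edge j -> k means message j is known to receiver k."""
    return {(j, k) for k, known in side_info.items() for j in known}

# The 3-message problem (1|2), (2|1,3), (3|1) from the text,
# encoded as receiver -> set of known message indices.
problem_1 = {1: {2}, 2: {1, 3}, 3: {1}}
edges = side_info_graph(problem_1)
print(sorted(edges))
```

The edge (1, 2) in the result corresponds to the edge 1 → 2 mentioned above, since message 1 belongs to the side information of receiver 2.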




Fig. 1. The index coding problem. 






We define a $(2^{nR_1}, \ldots, 2^{nR_N}, n)$ code for index coding by 
an encoder $x^n(m_1, \ldots, m_N)$ and N decoders $\hat{m}_j(x^n, m_{A_j})$, 
$j \in [1 : N]$. We assume that the message tuple $(M_1, \ldots, M_N)$ 
is uniform over $[1 : 2^{nR_1}] \times \cdots \times [1 : 2^{nR_N}]$, that 
is, the messages are uniform and independent of each 
other. The average probability of error is then defined as 
$P_e^{(n)} = P\{(\hat{M}_1, \ldots, \hat{M}_N) \ne (M_1, \ldots, M_N)\}$. A rate tuple 
$(R_1, \ldots, R_N)$ is said to be achievable if there exists 
a sequence of $(2^{nR_1}, \ldots, 2^{nR_N}, n)$ codes such that 
$\lim_{n \to \infty} P_e^{(n)} = 0$. 



Fig. 2. (a) A directed graph representation, (b) The equivalent network coding 
problem. Here every edge of the graph can carry up to 1 bit per transmission. 

First introduced by Birk and Kol [2] in the context of 
satellite broadcast communication, the index coding problem 
has been studied extensively over the past six years in the 
theoretical computer science and network coding communities 
with many exciting contributions of combinatorial and algebraic 
flavors (see, for example, [3]-[15] and the references 



therein). Our Shannon-theoretic formulation of the problem 
closely follows that of Maleki, Cadambe, and Jafar [16], who 
established the capacity region for several interesting classes 
of index coding problems using interference alignment [17]. 
Despite all these developments, the capacity region of a 
general index coding problem is not known. 

Confirming Maslow's axiom [18] "if all you have is a 
hammer, everything looks like a nail," we propose a random 
coding approach, replacing more advanced coding schemes of 
an algebraic nature. This approach is more in the spirit of the 
original paper by Ahlswede, Cai, Li, and Yeung [1], where 
random coding (binning) was used to establish the network 
coding theorem. In particular, we develop a composite coding 
scheme based on random coding and establish a corresponding 
single-letter inner bound on the capacity region. 

Instead of mechanical proofs, this paper focuses on basic 
intuitions behind our coding scheme, which we develop gradu- 
ally from simpler coding schemes — "flat coding" in Section III 
and "dual index coding" in Section IV. The composite coding 
scheme is explained in Section V. The next section discusses 
known outer bounds on the capacity region. 

II. Outer Bounds 

We first recall the following outer bound on the capacity 
region, which is a simple consequence of Fano's inequality 
and the submodularity of entropy; see, for example, [9] (or 
[19] for a similar bound in the context of a general network 
coding problem). 

Theorem 1: Let $B_j = [1 : N] \setminus (\{j\} \cup A_j)$ be the index set 
of interfering messages. If $(R_1, \ldots, R_N)$ is achievable, then 
it must satisfy 

$R_j \le T_{\{j\} \cup B_j} - T_{B_j}$, $j \in [1 : N]$, 

for some $T_J$, $J \subseteq [1 : N]$, such that 

1) $T_{\emptyset} = 0$, 

2) $T_{[1:N]} = 1$, 

3) for all $J \subseteq K$, $T_J \le T_K$, and 

4) for all $J$ and $K$, $T_{J \cap K} + T_{J \cup K} \le T_J + T_K$. 

It is not known whether this outer bound is tight in general. 
Sometimes a relaxed version of the bound is handy. 

Corollary 1: Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a directed graph 
representation of the index coding problem $(j \mid A_j)$, $j \in [1 : N]$, 
that is, $\mathcal{V} = [1 : N]$ and $(j, k) \in \mathcal{E}$ iff $j \in A_k$. If $(R_1, \ldots, R_N)$ 
is achievable, then it must satisfy 

$\sum_{j \in J} R_j \le 1$ 

for all $J \subseteq [1 : N]$ such that the subgraph of $\mathcal{G}$ over $J$ does 
not contain a directed cycle. 

The following example, due to [14], [16], illustrates that the 
two outer bounds do not coincide in general. 

Example 1: Consider the symmetric five-message index 
coding problem $(j \mid j-1, j+1)$, $j \in [1 : 5]$, with indices taken modulo 5, namely, 

(1|5,2), (2|1,3), (3|2,4), (4|3,5), (5|4,1). 




Fig. 3. A graph representation of the 5-message index coding problem. 

The corresponding graph representation is depicted in Figure 3. 
Applying Corollary 1, we obtain 

$R_1 + R_3 \le 1$, 
$R_2 + R_4 \le 1$, 
$R_3 + R_5 \le 1$,    (2) 
$R_4 + R_1 \le 1$, 
$R_5 + R_2 \le 1$. 

In comparison, Theorem 1 leads to the inequality 

$R_1 + R_2 + R_3 + R_4 + R_5 \le 2$,    (3) 

in addition to the above five inequalities. As we discuss 
in Section V, the resulting six inequalities characterize the 
capacity region of the index coding problem. 
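
The pairwise inequalities obtained from Corollary 1 can be reproduced mechanically by enumerating the index sets whose induced subgraph has no directed cycle. The sketch below is one way to do this by brute force; the helper names `is_acyclic` and `maximal_acyclic_sets` are hypothetical, and only maximal sets are reported because the inequality for a subset follows from that of a superset when rates are nonnegative.

```python
from itertools import combinations

def is_acyclic(nodes, side_info):
    """Test acyclicity by repeatedly removing nodes with no incoming edge."""
    remaining = set(nodes)
    while remaining:
        # Edges into k come from A_k, so k is a source if A_k misses `remaining`.
        sources = {k for k in remaining if not (side_info[k] & remaining)}
        if not sources:
            return False  # every remaining node has an incoming edge: a cycle exists
        remaining -= sources
    return True

def maximal_acyclic_sets(side_info):
    """Inclusion-maximal index sets J whose induced subgraph is acyclic."""
    nodes = sorted(side_info)
    acyclic = [frozenset(c)
               for r in range(1, len(nodes) + 1)
               for c in combinations(nodes, r)
               if is_acyclic(c, side_info)]
    return [J for J in acyclic if not any(J < K for K in acyclic)]

# Example 1: (1|5,2), (2|1,3), (3|2,4), (4|3,5), (5|4,1).
example_1 = {1: {5, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
bounds = maximal_acyclic_sets(example_1)
for J in sorted(bounds, key=sorted):
    print(" + ".join(f"R{j}" for j in sorted(J)), "<= 1")
```

The enumeration is exponential in N, which is acceptable for the small problems considered here but is only meant as a check, not as an efficient algorithm.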

III. Flat Coding 

Consider the following simple random coding scheme. For 
each $(m_1, \ldots, m_N) \in [1 : 2^{nR_1}] \times \cdots \times [1 : 2^{nR_N}]$, generate 
a codeword $x^n(m_1, \ldots, m_N)$ randomly and independently as 
a Bern(1/2) sequence. To communicate $(m_1, \ldots, m_N)$, the 
sender transmits $x^n = x^n(m_1, \ldots, m_N)$. Receiver j uses 
simultaneous nonunique decoding [20] and finds the unique 
$\hat{m}_j \in [1 : 2^{nR_j}]$ such that $x^n(\hat{m}_j, m_{A_j}, m_{B_j})$ is jointly typical 
with (i.e., identical to) the received sequence $x^n$ for some 
$m_{B_j}$, where $B_j = [1 : N] \setminus (\{j\} \cup A_j)$. Since the codebook 
generation is "flat" (compared with "layered" superposition 
coding), simultaneous nonunique decoding is essentially identical 
to performing the unique decoding of $(\hat{m}_j, \hat{m}_{B_j})$ and then 
discarding the unnecessary part $\hat{m}_{B_j}$. 

This "flat coding" scheme achieves the following inner 
bound. 

Proposition 1: A rate tuple $(R_1, \ldots, R_N)$ is achievable for 
the index coding problem $(j \mid A_j)$, $j \in [1 : N]$, if 

$R_j + \sum_{k \in B_j} R_k < 1$, $j \in [1 : N]$. 

As an example, consider the 3-message problem in (1). 
Under flat coding, receiver 1 finds the unique $\hat{m}_1$ such that 
$x^n(\hat{m}_1, m_2, m_3) = x^n$ for some $m_3 \in [1 : 2^{nR_3}]$ and the given 
side information $m_2$. By the packing lemma [21, Sec. 3.4], it 
can be readily shown that the probability of decoding error for 
receiver 1 tends to zero as $n \to \infty$ if 

$R_1 + R_3 < 1$.    (4) 



Similarly, we obtain $R_2 < 1$ (inactive) and 

$R_2 + R_3 < 1$.    (5) 

By comparing with Theorem 1 (or Corollary 1), it can be 
easily checked that the rate region formed by (4) and (5) is 
indeed the capacity region. 
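
The flat coding conditions of Proposition 1 depend only on the side information sets, so they are easy to list programmatically. The following minimal sketch (the function name `flat_coding_constraints` and the dictionary encoding are assumptions for illustration) returns, for each receiver j, the index set {j} together with B_j whose rates must sum to less than 1.

```python
def flat_coding_constraints(side_info):
    """Map each receiver j to the set {j} union B_j, i.e., all indices outside A_j."""
    everyone = set(side_info)
    return {j: everyone - known for j, known in side_info.items()}

# The 3-message problem (1|2), (2|1,3), (3|1).
problem_1 = {1: {2}, 2: {1, 3}, 3: {1}}
constraints = flat_coding_constraints(problem_1)
for j, indices in constraints.items():
    print(f"receiver {j}:", " + ".join(f"R{k}" for k in sorted(indices)), "< 1")
```

For this problem the three sets are {1, 3}, {2}, and {2, 3}, which correspond to the conditions for receivers 1, 2, and 3 discussed above.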

It can be easily verified that for all index coding problems 
with 1, 2, and 3 messages — there are 1, 3, and 16 noni- 
somorphic problems [22] — flat coding achieves the capacity 
region. More generally, among 218 four-message index coding 
problems, time sharing of flat coding over subsets of messages 
achieves the capacity region for all but three. The following 
is one of the three exceptions. 

Example 2: Consider the 4-message index coding problem 

(1|4), (2|3,4), (3|1,2), (4|2,3). 

On the one hand, flat coding yields an inner bound on 
the capacity region that consists of the rate quadruples 
$(R_1, R_2, R_3, R_4)$ such that 

$R_1 + R_2 + R_3 < 1$, 
$R_1 + R_4 < 1$, 
$R_3 + R_4 < 1$. 

It can be verified that this inner bound cannot be improved 
upon by time sharing over subsets. On the other hand, Theorem 1 
yields an outer bound that consists of the rate quadruples 
$(R_1, R_2, R_3, R_4)$ such that 

$R_1 + R_2 \le 1$, 
$R_1 + R_3 \le 1$,    (6) 
$R_1 + R_4 \le 1$, 
$R_3 + R_4 \le 1$. 

We will see in Section V that this outer bound is tight. 
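
To make the gap between the two bounds concrete, consider the symmetric point with all four rates equal to 1/2. This point is chosen here purely for illustration, and the check below concerns only flat coding without time sharing.

```latex
% Test point (an illustrative choice): R_1 = R_2 = R_3 = R_4 = 1/2.
\begin{align*}
\text{Outer bound (6):}\quad
  & R_1 + R_2 = R_1 + R_3 = R_1 + R_4 = R_3 + R_4 = 1 \le 1
  && \text{(all four inequalities hold)}, \\
\text{Flat coding:}\quad
  & R_1 + R_2 + R_3 = \tfrac{3}{2} > 1
  && \text{(first condition violated)}.
\end{align*}
```

Hence the point satisfies the outer bound but does not satisfy the flat coding conditions of Proposition 1 for this problem.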

While flat coding is suboptimal in general, the analysis (i.e., 
the proof of Proposition 1) is trivial and does not rely on 
any graph theoretic machinery. This observation will become 
crucial when we generalize the coding scheme subsequently. 

IV. Dual Index Coding 

Before we move on to a more powerful random coding 
scheme, we introduce a communication problem (depicted in 
Figure 4) that is, in some sense, dual to the index coding 
problem. Here a set of $(2^N - 1)$ senders wish to communicate 
a message tuple $(M_1, \ldots, M_N)$ to a common receiver, each 
encoding a subtuple $M_J$ into a separate index $W_J \in [1 : 2^{nS_J}]$ 
for nonempty $J \subseteq [1 : N]$. What is the capacity region 
(as a function of the rates $S_J$)? 

This problem turns out to be a special case of the general 
multiple access channel (MAC) with correlated messages 
studied by Han [23]. For the general MAC, superposition 
coding achieves the capacity region that is characterized by 
independent auxiliary random variables $U_1, \ldots, U_N$, each corresponding 
to a message. However, for the dual index coding 
problem, we can characterize the capacity region explicitly. 

Fig. 4. The dual index coding problem. 



Proposition 2: The capacity region of the dual index coding 
problem is the set of rate tuples $(R_1, \ldots, R_N)$ such that 

$\sum_{j \in J} R_j \le \sum_{J' \subseteq [1:N] : J' \cap J \ne \emptyset} S_{J'}$    (7) 

for all $J \subseteq [1 : N]$. 

What is perhaps more important than this explicit character- 
ization of the capacity region is the fact that it can be achieved 
by flat coding, which we will utilize later. 

As an example, consider the three-message three-sender 
dual index coding problem in Figure 5, where $S_{1,2} = 1$ and 
$S_{1,3} = S_{1,2,3} = 2$. By (7), the capacity region is the set of 
rate triples $(R_1, R_2, R_3)$ such that 

$R_1 + R_2 + R_3 \le 5$, 
$R_2 \le 3$,    (8) 
$R_3 \le 4$. 

This can be achieved via flat coding of $(M_1, M_2)$, $(M_1, M_3)$, 
and $(M_1, M_2, M_3)$, respectively, and simultaneous decoding 
at the receiver. 
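
The right-hand sides in Proposition 2 can be tabulated directly from the sender rates. The sketch below does this for the three-sender example; the function name `dual_index_bounds` and the tuple-keyed dictionary are assumptions made for illustration.

```python
from itertools import combinations

def dual_index_bounds(N, S):
    """For each nonempty J, sum the rates S_J' of all senders whose set J' meets J."""
    bounds = {}
    for r in range(1, N + 1):
        for J in combinations(range(1, N + 1), r):
            bounds[J] = sum(rate for Jp, rate in S.items() if set(Jp) & set(J))
    return bounds

# Sender rates of the example: S_{1,2} = 1 and S_{1,3} = S_{1,2,3} = 2.
S = {(1, 2): 1, (1, 3): 2, (1, 2, 3): 2}
bounds = dual_index_bounds(3, S)
for J, value in bounds.items():
    print(" + ".join(f"R{j}" for j in J), "<=", value)
```

Every set containing message 1 yields the bound 5, so all of these are implied by the sum-rate inequality; the only other bounds are 3 for message 2 and 4 for message 3, in agreement with (8).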

Fig. 5. An example for dual index coding. 



V. Composite Coding 

Equipped with the results in the previous two sections, 
we now introduce another random coding scheme, which we 
refer to as composite coding. This is best described by an 
example. 

We revisit the 5-message problem [14], [16] in Example 1. 
As the first step of composite coding, the sender encodes 
$(M_1, M_2)$ into an index $W_{1,2}$ at rate $S_{1,2}$ using random coding, 
and similarly encodes $(M_2, M_3)$, $(M_3, M_4)$, $(M_4, M_5)$, 
and $(M_1, M_5)$, respectively, into indices $W_{2,3}$, $W_{3,4}$, $W_{4,5}$, 
and $W_{1,5}$. Equivalently, we decompose the sender into 5 
"virtual" senders, each mapping one of the above pairs of 
messages (as in the dual index coding problem). As the 
second step, the sender uses flat coding to communicate the 
"composite" indices $W_{1,2}$, $W_{2,3}$, $W_{3,4}$, $W_{4,5}$, $W_{1,5}$. As with 
encoding, decoding also takes two steps. Each receiver first 
recovers the composite indices and then recovers the desired 
message from the composite indices. For example, receiver 1 
recovers $W_{1,2}$, $W_{1,5}$ (and other composite indices too). Now 
since it has side information $(M_2, M_5)$, it can recover $M_1$ 
from $(W_{1,2}, W_{1,5})$ at rate $S_{1,2} + S_{1,5}$. Following similar 
steps for other receivers and incorporating the flat coding 
rate condition, it can be easily verified that the rate quintuple 
$(R_1, R_2, R_3, R_4, R_5)$ is achievable if 

$R_1 < S_{1,2} + S_{1,5}$, 
$R_2 < S_{1,2} + S_{2,3}$, 
$R_3 < S_{2,3} + S_{3,4}$, 
$R_4 < S_{3,4} + S_{4,5}$, 
$R_5 < S_{1,5} + S_{4,5}$ 

for some $(S_{1,2}, S_{2,3}, S_{3,4}, S_{4,5}, S_{1,5})$ such that 
$S_{1,2} + S_{2,3} + S_{3,4} + S_{4,5} + S_{1,5} < 1$. Fourier-Motzkin elimination 
[21, Appendix D] of the composite index rates yields the inequalities 
in (2) and (3) in the outer bound, establishing the capacity 
region. 
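
One direction of this elimination can be checked by hand. The derivation below is a partial sketch, assuming only that the composite index rates are nonnegative; it shows that the five rate conditions together with the sum constraint imply inequalities of the form (2) and (3), and it does not replace the full Fourier-Motzkin elimination.

```latex
\begin{align*}
R_1 + R_3
  &< (S_{1,2} + S_{1,5}) + (S_{2,3} + S_{3,4}) \\
  &\le S_{1,2} + S_{2,3} + S_{3,4} + S_{4,5} + S_{1,5} < 1, \\[4pt]
\sum_{j=1}^{5} R_j
  &< 2\,\bigl(S_{1,2} + S_{2,3} + S_{3,4} + S_{4,5} + S_{1,5}\bigr) < 2.
\end{align*}
```

The first line uses the conditions for receivers 1 and 3 and drops the nonnegative term $S_{4,5}$; the other four pairwise inequalities follow in the same way by symmetry. The second line adds all five conditions, in which each composite rate appears exactly twice.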

As another example, we revisit the four-message problem 
in Example 2. In this case, we use the composite indices $W_{1,4}$ 
and $W_{1,2,3,4}$ of rates $S_{1,4}$ and $S_{1,2,3,4}$, respectively. Then, it 
can be easily verified from Proposition 2 that receiver 1 can recover 
$M_1$ if $R_1 < S_{1,4}$; receiver 2 can recover $M_2$ (and $M_1$ as 
well) if $R_1 + R_2 < S_{1,4} + S_{1,2,3,4}$ and $R_2 < S_{1,2,3,4}$; receiver 3 
can recover $M_3$ (and $M_4$ as well) if $R_3 + R_4 < S_{1,4} + S_{1,2,3,4}$ 
and $R_3 < S_{1,2,3,4}$; and receiver 4 can recover $M_4$ (and $M_1$ as 
well) if $R_1 + R_4 < S_{1,4} + S_{1,2,3,4}$. By eliminating $S_{1,4}$ and 
$S_{1,2,3,4}$ under the constraint $S_{1,4} + S_{1,2,3,4} < 1$, we obtain the 
same set of inequalities as in the outer bound (6), establishing 
the capacity region. 

In general, we can utilize $(2^N - 1)$ virtual senders to encode 
N messages. Moreover, the receivers can employ simultaneous 
nonunique decoding for the second-step decoding (or equiva- 
lently, ignore some composite indices in an optimal manner). 
This coding scheme is illustrated in Figure 6. 

To characterize the performance of the composite coding 
scheme, for each $(S_J : J \subseteq [1 : N])$, we define the polymatroidal 
region $\mathscr{R}(\mathcal{K} \mid \mathcal{K}')$ as the set of rate tuples $(R_1, \ldots, R_N)$ 
such that 

$\sum_{j \in J} R_j < \sum_{J' \subseteq \mathcal{K} \cup \mathcal{K}' : J' \cap J \ne \emptyset} S_{J'}$    (9) 

for all $J \subseteq \mathcal{K} \setminus \mathcal{K}'$. This region corresponds to the capacity 
region of the dual index coding problem (Proposition 2) for 
the desired message set $\mathcal{K}$ with side information $\mathcal{K}'$. We are 
now ready to state the main result of the paper. 

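Theorem 2 (Composite-coding inner bound): A rate tuple 
$(R_1, \ldots, R_N)$ is achievable for the index coding problem 
$(j \mid A_j)$, $j \in [1 : N]$, if 

$(R_1, \ldots, R_N) \in \bigcap_{j \in [1:N]} \bigcup_{\mathcal{K} \subseteq [1:N] : j \in \mathcal{K}} \mathscr{R}(\mathcal{K} \mid A_j)$    (10) 

for some $(S_J : J \subseteq [1 : N])$ such that $\sum_{J : J \not\subseteq A_j} S_J \le 1$ for 
all $j \in [1 : N]$. 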
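
For a fixed choice of composite index rates, membership in the region of Theorem 2 can be tested by brute force over the decoding sets. The following is a minimal sketch under that assumption; the function names `in_region` and `composite_achievable`, the dictionary encodings, and the test rates 39/100 and 41/100 are all illustrative choices, and the code does not search over the rates $S_J$ themselves.

```python
from fractions import Fraction
from itertools import combinations

def nonempty_subsets(items):
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def in_region(R, S, K, A):
    """Check the polymatroidal inequalities for decoding set K with side information A."""
    ground = K | A
    for J in nonempty_subsets(K - A):
        lhs = sum(R[j] for j in J)
        # Sum S_J' over index sets J' inside K union A that intersect J.
        rhs = sum(rate for Jp, rate in S.items() if Jp <= ground and Jp & J)
        if not lhs < rhs:
            return False
    return True

def composite_achievable(R, S, side_info):
    """Test the condition of Theorem 2 for given rates R and fixed composite rates S."""
    for j, A in side_info.items():
        # Composite indices not already known to receiver j must fit in the unit-rate link.
        if sum(rate for Jp, rate in S.items() if not Jp <= A) > 1:
            return False
        others = set(side_info) - {j}
        candidates = [frozenset({j})]
        candidates += [K | {j} for K in nonempty_subsets(others)]
        if not any(in_region(R, S, K, A) for K in candidates):
            return False
    return True

# Five-message problem of Example 1 with the five pairwise composite indices at rate 1/5.
side_info = {1: frozenset({5, 2}), 2: frozenset({1, 3}), 3: frozenset({2, 4}),
             4: frozenset({3, 5}), 5: frozenset({4, 1})}
S = {frozenset(p): Fraction(1, 5) for p in [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]}
inside = composite_achievable({j: Fraction(39, 100) for j in side_info}, S, side_info)
outside = composite_achievable({j: Fraction(41, 100) for j in side_info}, S, side_info)
print(inside, outside)
```

Exact rational arithmetic via `Fraction` avoids floating-point issues with the strict inequalities. The running time grows exponentially with N, so this is suitable only for small instances.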

At first glance, composite coding seems to be time sharing 
of flat coding over all subsets of $[1 : N]$. However, it employs 
the optimal decoding rule that utilizes all composite indices 
(subsets) that are relevant to the desired message. As such, 
the corresponding rate region has a form very similar to that of the 
optimal rate region for interference networks with random 
coding [24]. 

Using the polco tool for polyhedral computations [25], we 
have computed the composite-coding inner bound and the 
outer bound in Theorem 1 for all 9608 nonisomorphic five- 
message index coding problems [22]. In all cases, both bounds 
agree, establishing the capacity region. 

To further demonstrate the utility of composite coding, we 
revisit the following example in [16]. 

Fig. 6. Composite coding scheme. 



Example 3: Consider the N-message symmetric index coding 
problem 

$(j \mid j-D, j-D+1, \ldots, j-1, j+1, \ldots, j+U)$ 

for $j \in [1 : N]$. For example, the 5-message problem in 
Example 1 is a special case of this problem with N = 5 
and D = U = 1. We assume without loss of generality that 
$0 \le U \le D \le N - U - 1$. Set $S_J = 1/(N - (D - U))$ 
if $J$ is of the form $[k : k + U]$. Set $S_J = 0$ otherwise. 
Since receiver $j \in [1 : N]$ has $M_{j-D}, \ldots, M_{j-1}$ as side 
information, it already knows the $(D - U)$ composite indices 
$W_{[j-D : j-D+U]}, \ldots, W_{[j-U-1 : j-1]}$. Thus, there are only 
$N - (D - U)$ composite indices that need to be recovered 
from $x^n$, which is feasible since $\sum_{J : J \not\subseteq A_j} S_J = 1$. 
Now receiver j can recover $M_j$ from the composite indices 
$W_{[j-U : j]}, \ldots, W_{[j : j+U]}$, provided that 

$R_j < S_{[j-U : j]} + \cdots + S_{[j : j+U]}$. 

Hence, the symmetric rate of $(U + 1)/(N - D + U)$ is 
achievable. In [16] it is shown that this symmetric rate is in 
fact optimal, which can be also verified directly by the outer 
bound in Theorem 1. For N = 6, U =1, and D = 2, that is, 

(1|2,5,6), (2|1,3,6), (3|1,2,4), 
(4|2,3,5), (5|3,4,6), (6|1,4,5), 

the symmetric rate of 2/5 is optimal. In fact, simplifying 
Theorems 1 and 2 yields the capacity region that consists of 
the rate sextuples $(R_1, \ldots, R_6)$ such that 

$R_j + R_{j+2} \le 1$, $j \in [1 : 6]$, 
$R_j + R_{j+3} \le 1$, $j \in [1 : 6]$, 
$R_j + R_{j+1} + R_{j+2} + R_{j+3} + R_{j+4} \le 2$, $j \in [1 : 6]$. 

In particular, this region is achievable by using composite 
indices $W_1$, $W_2$, $W_3$, $W_4$, $W_5$, $W_6$, $W_{1,2}$, $W_{2,3}$, $W_{3,4}$, $W_{4,5}$, 
$W_{5,6}$, and $W_{1,6}$. 
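
The counting argument behind the symmetric rate in Example 3 can be replayed numerically. The sketch below is an illustration under the stated assumption $0 \le U \le D \le N - U - 1$, with side information read off the problem definition (D indices behind and U indices ahead, modulo N); the function name `symmetric_composite_rate` is hypothetical.

```python
from fractions import Fraction

def symmetric_composite_rate(N, U, D):
    """Symmetric rate from interval indices W_[k:k+U], k = 1..N, with cyclic indexing."""
    def interval(k):
        return frozenset((k + i - 1) % N + 1 for i in range(U + 1))

    intervals = [interval(k) for k in range(1, N + 1)]
    j = 1  # by symmetry, every receiver gives the same counts
    known_msgs = ({(j - d - 1) % N + 1 for d in range(1, D + 1)}
                  | {(j + u - 1) % N + 1 for u in range(1, U + 1)})
    # Composite indices built only from known messages need not be decoded.
    unknown = [I for I in intervals if not I <= known_msgs]
    share = Fraction(1, len(unknown))  # equal split so the unknown indices sum to rate 1
    useful = [I for I in intervals if j in I]  # the U + 1 intervals that contain j
    return share * len(useful)

print(symmetric_composite_rate(5, 1, 1))  # Example 1
print(symmetric_composite_rate(6, 1, 2))  # the six-message instance
```

Both calls return 2/5, matching $(U + 1)/(N - D + U)$ for these parameters. The function does not validate its arguments, so values outside the assumed range may give meaningless results.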

VI. Concluding Remarks 

Based on a first principle in Shannon's random coding, this 
paper has established the composite-coding inner bound on 
the general index coding problem. This inner bound is simple, 
easy to compute, yet is powerful and tight for all index coding 
problems of up to five messages as well as many existing 
examples. In a sense, random coding is a "jackknife" rather 
than a "hammer." 

The polymatroidal structure of the composite-coding inner 
bound and the submodularity of the outer bound suggest a 
deeper connection rooted in matroid theory [19], [26]. In 
addition to evaluating the inner and outer bounds for more 
examples (there are 1540944 nonisomorphic six-message in- 
dex coding problems), future studies will focus on analyzing 
the algebraic structures of these bounds to investigate what 
lies in the path to establishing the capacity region of a general 
index coding problem. 